Speech Synthesis

The Best 610 Speech Synthesis Tools in 2025

Kokoro is an open-source text-to-speech (TTS) model with 82 million parameters, known for its lightweight architecture and high audio quality as well as its speed and low cost.

Speech Synthesis English

ⓍTTS is a voice generation model that can clone a voice across languages from just a 6-second audio clip and supports 17 languages.

Speech Synthesis

F5-TTS is a speech synthesis model based on flow matching that focuses on fluent, faithful speech and is particularly well suited to uses such as fairy-tale narration.
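Flow matching, in its generic form, trains a network to predict a velocity field that carries noise samples to data samples. With straight-line paths, the point at time t is a linear interpolation between noise x0 and data x1, and the regression target is simply x1 - x0. The following is a minimal sketch of that generic formulation only, not F5-TTS's training code (which also conditions on text and reference audio). The function names are hypothetical, and a known constant velocity stands in for a trained network.

```python
def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Regression target for the velocity field along that straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: a constant "oracle" velocity replaces the trained network.
x0 = [0.0, 1.0]   # "noise" sample
x1 = [2.0, -1.0]  # "data" sample
v = velocity_target(x0, x1)
print(euler_sample(lambda x, t: v, x0, steps=10))
```

At inference time the learned velocity is integrated from t = 0 to t = 1; using fewer integration steps trades output quality for speed.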

Speech Synthesis

BigVGAN v2 22kHz 80-band 256x

BigVGAN is a general-purpose neural vocoder trained at scale, capable of generating high-quality audio waveforms from mel spectrograms.
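A vocoder of this kind consumes a mel spectrogram (energies in frequency bands spaced evenly on the perceptual mel scale) and upsamples each frame to a fixed number of waveform samples. The sketch below assumes the model name encodes a 22,050 Hz sample rate, 80 mel bands, and a total upsampling factor (hop size) of 256. It uses the HTK mel formula for illustration; BigVGAN's own preprocessing may use a different mel variant, and all function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels triangular filters spaced evenly in mel."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels filters need n_mels + 2 evenly spaced edge points; skip the two outer edges.
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


def samples_for_frames(n_frames, hop=256):
    """A vocoder with total upsampling factor `hop` emits `hop` samples per mel frame."""
    return n_frames * hop


SAMPLE_RATE = 22050  # assumed from the "22khz" in the model name
centers = mel_band_centers(80, 0.0, SAMPLE_RATE / 2)
n = samples_for_frames(100)
print(len(centers), n, round(n / SAMPLE_RATE, 3))  # 80 25600 1.161
```

So, under these assumptions, 100 mel frames become 25,600 samples, or about 1.16 seconds of audio.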

Speech Synthesis

A SpeechT5 speech synthesis (text-to-speech) model fine-tuned on the LibriTTS dataset, supporting high-quality text-to-speech conversion.

Speech Synthesis

Dia is a 1.6-billion-parameter text-to-speech model developed by Nari Labs. It generates highly realistic dialogue directly from text, supports emotion and tone control, and can produce non-verbal sounds.

Speech Synthesis

Safetensors English

CSM is a 1-billion-parameter speech generation model developed by Sesame that generates RVQ audio codes from text and audio inputs.
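Residual vector quantization (RVQ) encodes a vector with a stack of codebooks: each stage picks the codeword nearest to the residual left by the stages before it, and decoding sums the chosen codewords. The sketch below illustrates the general technique with tiny hand-made codebooks; it is not Sesame's codec, whose codebooks are learned, and the function names are hypothetical.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    best_i, best_d = 0, float("inf")
    for i, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(vec, c))
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def rvq_encode(codebooks, vec):
    """Each stage quantizes the residual left over by the previous stages."""
    residual = list(vec)
    codes = []
    for cb in codebooks:
        i = nearest(cb, residual)
        codes.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return codes


def rvq_decode(codebooks, codes):
    """The reconstruction is the sum of the selected codewords across all stages."""
    out = [0.0] * len(codebooks[0][0])
    for cb, i in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, cb[i])]
    return out


# Toy two-stage example: a coarse codebook followed by a finer one.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]
codes = rvq_encode(codebooks, [1.2, 0.8])
print(codes, rvq_decode(codebooks, codes))  # [1, 1] [1.25, 0.75]
```

Adding stages lowers the reconstruction error at the cost of emitting more codes per frame.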

Speech Synthesis English

Kokoro 82M V1.1 Zh

Kokoro is an open-weight series of small yet powerful text-to-speech (TTS) models, now featuring data from 100 Chinese speakers sourced from professional datasets.

Speech Synthesis

Indic Parler-TTS

Indic Parler-TTS is a multilingual extension of Parler-TTS Mini, supporting 21 languages including various Indian languages and English.

Speech Synthesis

Transformers Supports Multiple Languages

Bark is a Transformer-based text-to-audio model created by Suno, capable of generating highly realistic multilingual speech, music, background noise, and simple sound effects.

Speech Synthesis

Transformers Supports Multiple Languages

F5-TTS is a fully non-autoregressive zero-shot text-to-speech model that supports high-quality speech synthesis.

Speech Synthesis

XCodec2 is a speech tokenizer that supports multilingual semantic understanding of speech and high-quality speech reconstruction.

Speech Synthesis

Parler-TTS Large v1

A 2.2-billion-parameter text-to-speech model trained on 45,000 hours of audio data, whose voice characteristics can be controlled through text prompts.

Speech Synthesis

Transformers English

An English text-to-speech model developed by Meta, based on the VITS architecture and supporting high-quality speech synthesis.

Speech Synthesis

Bark is a Transformer-based multilingual text-to-audio model developed by Suno, capable of generating realistic speech, music, and non-verbal sounds.

Speech Synthesis

Transformers Supports Multiple Languages

A Yoruba text-to-speech model developed by Meta, based on the VITS architecture, for high-quality speech synthesis.

Speech Synthesis

Parler-TTS Mini v1

A lightweight text-to-speech model trained on 45,000 hours of audio, whose voice characteristics can be controlled through text prompts.

Speech Synthesis

Transformers English

Orpheus 3B 0.1 FT Q4_K_M GGUF

Orpheus-TTS is a lightweight text-to-speech model that can run locally and provides high-quality speech synthesis.

Speech Synthesis English

This is an RVC (Retrieval-based Voice Conversion) model designed for audio-to-audio tasks, capable of converting input audio into output audio with a specific style.
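Retrieval-based voice conversion, broadly, extracts content features from the input audio, looks up the nearest features from the target speaker's training data, and blends them with the originals before synthesis. The sketch below shows only that retrieval-and-blend step on toy vectors. The inverse-distance weighting, the parameter names k and index_rate, and the brute-force search (real systems typically use a nearest-neighbour index) are assumptions for illustration.

```python
def retrieve_and_blend(query, bank, k=2, index_rate=0.75, eps=1e-8):
    """Blend a content feature with a weighted average of its k nearest bank features.

    Hypothetical helper: `bank` stands in for features extracted from the
    target speaker's training audio.
    """
    # Brute-force k-nearest-neighbour search by squared Euclidean distance.
    scored = sorted(
        ((sum((q - b) ** 2 for q, b in zip(query, feat)), feat) for feat in bank),
        key=lambda pair: pair[0],
    )[:k]
    # Inverse-distance weights; eps avoids division by zero on exact matches.
    weights = [1.0 / (d + eps) for d, _ in scored]
    total = sum(weights)
    retrieved = [
        sum(w * feat[j] for w, (_, feat) in zip(weights, scored)) / total
        for j in range(len(query))
    ]
    # index_rate = 1.0 uses only retrieved features; 0.0 keeps the input unchanged.
    return [index_rate * r + (1.0 - index_rate) * q for r, q in zip(retrieved, query)]


bank = [[1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]  # toy "training-set" features
print(retrieve_and_blend([0.0, 0.0], bank, k=2, index_rate=0.75))  # roughly [0.6, 0.3]
```

A higher blend ratio pulls the output closer to the target speaker's timbre, while a lower one preserves more of the input's character.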

Speech Synthesis

Homersimpson2333333

This is a voice conversion model based on RVC (Retrieval-Based Voice Conversion) technology, capable of transforming input audio into the voice style of Homer Simpson.

Speech Synthesis

Freddie Mercury RVC 700 Epochs

This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, trained for 700 epochs, capable of converting input audio into Freddie Mercury-style speech.

Speech Synthesis

Lana Del Rey E1000 S13000

This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of converting input audio into speech in the style of Lana Del Rey's voice.

Speech Synthesis

Adele RVC 400 Epochs

This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, trained for 400 epochs, capable of converting input audio into output audio that mimics Adele's vocal timbre.

Speech Synthesis

This is an audio-to-audio conversion model based on the RVC architecture, designed to convert input audio into XXXTentacion's vocal style.

Speech Synthesis

XPhoneBERT Base

XPhoneBERT is the first multilingual model pre-trained to learn phoneme representations for text-to-speech (TTS). It is based on the BERT-base architecture and trained on 330 million phoneme-level sentences across nearly 100 languages.

Speech Synthesis

IndicF5 is a near-human multilingual text-to-speech (TTS) model trained on 1,417 hours of high-quality speech data, supporting 11 Indian languages.

Speech Synthesis Other

This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of transforming input audio into Michael Jackson-style speech.

Speech Synthesis

This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of converting source speech into a target voice style.

Speech Synthesis

Eminem E600 S5400

This is a voice conversion model based on RVC (Retrieval-Based Voice Conversion) technology, capable of transforming input audio into speech in the style of Eminem's voice.

Speech Synthesis

ⓍTTS is a voice generation model that can clone voices and apply them to different languages with just a 6-second audio clip.

Speech Synthesis

Parler-TTS Mini v0.1

Parler-TTS Mini is a lightweight text-to-speech model trained on 10.5K hours of audio data, supporting voice feature control through text prompts.

Speech Synthesis

Transformers English

Ariana Grande RVC V1

This is a voice conversion model based on RVC (Retrieval-Based Voice Conversion) technology, capable of transforming input audio into Ariana Grande-style speech.

Speech Synthesis

Fish Speech V1.5 is a leading text-to-speech (TTS) model trained on over 1 million hours of multilingual audio data.

Speech Synthesis Supports Multiple Languages

CSM is a 1B-parameter speech generation model developed by Sesame, capable of generating RVQ audio codes from text and audio inputs, supporting context-aware speech generation.

Speech Synthesis English

Drake_RVC is an audio-to-audio model based on RVC (Retrieval-based Voice Conversion) technology, specifically designed for voice conversion tasks.

Speech Synthesis

HiFiGAN is a Generative Adversarial Network (GAN) model capable of generating high-quality audio from mel-spectrograms, suitable for text-to-speech systems.

Speech Synthesis English

This is an RVC (Retrieval-based Voice Conversion) model designed for audio-to-audio conversion tasks.

Speech Synthesis

This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of transforming input audio into output audio that mimics Billie Eilish's voice.

Speech Synthesis

TTS En FastPitch

FastPitch is a fully parallel Transformer-based text-to-speech model capable of controlling pitch and phoneme duration, generating high-quality American English speech.
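FastPitch-style models predict a duration (in frames) and a pitch value for each input symbol, then expand the symbol features to frame rate before decoding; that explicit expansion is what makes duration and pitch directly controllable. The sketch below shows the expansion step (often called length regulation) plus a simple semitone pitch shift. The function names, the pace factor, and the use of raw Hz values are assumptions, not the model's actual interface.

```python
def length_regulate(phoneme_feats, durations, pace=1.0):
    """Repeat each phoneme's features for its (pace-scaled) number of frames.

    pace > 1.0 speeds speech up (fewer frames); pace < 1.0 slows it down.
    """
    frames = []
    for feat, dur in zip(phoneme_feats, durations):
        n = max(0, round(dur / pace))
        frames.extend([feat] * n)
    return frames


def shift_pitch(pitch_per_phoneme, semitones):
    """Scale a per-phoneme F0 contour (Hz) by a number of semitones.

    Unvoiced entries stored as 0 Hz stay at 0.
    """
    factor = 2.0 ** (semitones / 12.0)
    return [p * factor for p in pitch_per_phoneme]


print(length_regulate(["h", "i"], [2, 3]))  # ['h', 'h', 'i', 'i', 'i']
print(shift_pitch([100.0, 0.0], 12))        # [200.0, 0.0]
```

Because every frame's source phoneme is known after expansion, editing the durations or pitch values before decoding changes timing or intonation without retraining.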

Speech Synthesis English

A French text-to-speech model developed by Meta, based on the VITS architecture and supporting high-quality speech synthesis.

Speech Synthesis

This is an audio conversion model based on RVC (Retrieval-Based Voice Conversion) technology, specifically designed to transform input audio into Justin Bieber's vocal style.

Speech Synthesis

Frank Sinatra 51600 Steps 250 Epochs RVC

This is an audio-to-audio conversion model based on the RVC framework, designed to convert input audio into Frank Sinatra's vocal style.

Speech Synthesis

© 2025 AIbase